
    Contributions to the Computational Treatment of Non-literal Language

    A thesis submitted in partial fulfilment of the requirements of the University of Wolverhampton for the degree of Doctor of Philosophy. Non-literal language concerns the deliberate use of language in such a way that meaning cannot be inferred through a mere literal interpretation. In this thesis, three different forms of this phenomenon are studied, namely irony, non-compositional Multiword Expressions (MWEs), and metaphor. We start by developing models to identify ironic comments on the social micro-blogging website Twitter. In these experiments, we propose a new way to extract features based on a study of their spatial structure. The proposed model is shown to perform competitively on a standard Twitter dataset. Next, we extensively study MWEs, which are the central focus of this work. We start by framing the task of MWE identification as sequence labelling and devise experiments to assess the effect of eye-tracking data in capturing formulaic MWEs using structured prediction. We also develop a novel neural architecture to specifically address the issue of discontinuous MWEs using a combination of Graph Convolutional Networks (GCNs) and self-attention. The proposed model is subsequently tested on several languages, where it is shown to outperform the state of the art both overall and in capturing gappy MWEs. In the final part of the thesis, we look at metaphor and its interaction with verbal MWEs. In a series of experiments, we propose a hybrid BERT-based model augmented with a novel variation of GCN, performing classification on two standard metaphor datasets using information from MWEs. This model, which performs on par with the state of the art, is, to the best of our knowledge, the first MWE-aware metaphor identification system, paving the way for further experimentation on the interaction of different types of figurative language.
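
    As a rough illustration of the architecture sketched in the abstract, the following is a minimal combination of a graph convolution over a dependency-based adjacency matrix with self-attention for per-token MWE tagging. This is not the thesis implementation; the dimensions, tag set, and layer arrangement are all assumptions.

```python
# Minimal sketch: GCN over dependency edges + self-attention for MWE tagging.
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.linear = nn.Linear(dim, dim)

    def forward(self, x, adj):
        # adj: (batch, seq, seq) normalised adjacency built from dependency
        # parses; aggregating over syntactic neighbours helps connect the
        # parts of a discontinuous ("gappy") MWE.
        return torch.relu(self.linear(adj @ x))

class GCNSelfAttentionTagger(nn.Module):
    def __init__(self, vocab_size, dim=128, n_tags=5, n_heads=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.gcn = GCNLayer(dim)
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.out = nn.Linear(dim, n_tags)     # e.g. BIO-style MWE tags (assumed)

    def forward(self, tokens, adj):
        x = self.embed(tokens)
        x = self.gcn(x, adj)                  # syntax-aware token states
        x, _ = self.attn(x, x, x)             # long-range links across gaps
        return self.out(x)                    # per-token tag logits

# Toy usage with random inputs.
model = GCNSelfAttentionTagger(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 10))
adj = torch.eye(10).expand(2, -1, -1)         # identity stands in for a parse
print(model(tokens, adj).shape)               # torch.Size([2, 10, 5])
```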

    Cross-lingual transfer learning and multitask learning for capturing multiword expressions

    This is an accepted manuscript of an article published by the Association for Computational Linguistics in Proceedings of the Joint Workshop on Multiword Expressions and WordNet (MWE-WN 2019), available online: https://www.aclweb.org/anthology/W19-5119. The accepted version of the publication may differ from the final published version. Recent developments in deep learning have prompted a surge of interest in the application of multitask and transfer learning to NLP problems. In this study, we explore, for the first time, the application of transfer learning (TRL) and multitask learning (MTL) to the identification of Multiword Expressions (MWEs). For MTL, we exploit the shared syntactic information between MWE and dependency parsing models to jointly train a single model on both tasks, predicting two types of labels: MWE tags and dependency parses. Our neural MTL architecture applies the supervision of dependency parsing in lower layers and predicts MWE tags in upper layers. In the TRL scenario, we overcome the scarcity of data by learning a model on a larger MWE dataset and transferring the knowledge to a resource-poor setting in another language. In both scenarios, the resulting models achieve higher performance than standard neural approaches.
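
    A minimal sketch of the hierarchical multitask idea, under assumed dimensions and label sets: a lower BiLSTM layer is supervised with dependency labels while an upper layer predicts MWE tags, and the two losses are summed.

```python
# Sketch: dependency supervision on lower layers, MWE tags on upper layers.
import torch
import torch.nn as nn

class MTLTagger(nn.Module):
    def __init__(self, vocab_size, dim=100, n_dep=40, n_mwe=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lower = nn.LSTM(dim, dim, batch_first=True, bidirectional=True)
        self.upper = nn.LSTM(2 * dim, dim, batch_first=True, bidirectional=True)
        self.dep_head = nn.Linear(2 * dim, n_dep)  # dependency-label logits
        self.mwe_head = nn.Linear(2 * dim, n_mwe)  # MWE-tag logits

    def forward(self, tokens):
        x = self.embed(tokens)
        low, _ = self.lower(x)      # shared syntactic representation
        high, _ = self.upper(low)   # task-specific refinement
        return self.dep_head(low), self.mwe_head(high)

model = MTLTagger(vocab_size=1000)
tokens = torch.randint(0, 1000, (2, 12))
dep_logits, mwe_logits = model(tokens)

# Joint objective: sum of the two per-token tagging losses.
loss_fn = nn.CrossEntropyLoss()
dep_gold = torch.randint(0, 40, (2, 12))
mwe_gold = torch.randint(0, 5, (2, 12))
loss = (loss_fn(dep_logits.transpose(1, 2), dep_gold)
        + loss_fn(mwe_logits.transpose(1, 2), mwe_gold))
loss.backward()
```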

    GCN-Sem at SemEval-2019 Task 1: Semantic Parsing using Graph Convolutional and Recurrent Neural Networks

    This paper describes the system submitted to the SemEval 2019 shared task 1, 'Cross-lingual Semantic Parsing with UCCA'. We rely on the semantic dependency parse trees provided in the shared task, which are converted from the original UCCA files, and model the task as tagging. The aim is to predict the graph structure of the output along with the types of relations among the nodes. Our proposed neural architecture is composed of Graph Convolution and BiLSTM components. The layers of the system share their weights while predicting dependency links and semantic labels. The system operates on the CoNLL-U format of the input data and is best suited for semantic dependency parsing.
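
    To make the parsing-as-tagging framing concrete, here is a small reader that turns a fully annotated CoNLL-U file into per-token tag sequences, encoding each token's head as a relative offset plus a relation label. The encoding scheme is an illustrative assumption, not the submitted system's.

```python
# Sketch: recast dependency structure as a per-token tagging target.
def conllu_to_tag_sequences(path):
    sentences, current = [], []
    for line in open(path, encoding="utf-8"):
        line = line.strip()
        if not line:                      # blank line ends a sentence
            if current:
                sentences.append(current)
                current = []
            continue
        if line.startswith("#"):          # sentence-level comments
            continue
        cols = line.split("\t")
        if "-" in cols[0] or "." in cols[0]:
            continue                      # skip multiword-token/empty-node rows
        idx, form = int(cols[0]), cols[1]
        head, deprel = int(cols[6]), cols[7]
        # A relative head offset makes the "tag" predictable token by token.
        offset = "ROOT" if head == 0 else head - idx
        current.append((form, (offset, deprel)))
    if current:
        sentences.append(current)
    return sentences
```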

    WLV at SemEval-2018 task 3: Dissecting tweets in search of irony

    International Workshop on Semantic Evaluation. WLV at SemEval-2018 Task 3. This paper describes the systems submitted to SemEval 2018 Task 3, "Irony detection in English tweets", for both subtasks A and B. The first system, leveraging a combination of sentiment, distributional semantic, and text surface features, ranked third among 44 teams on the official leaderboard of subtask A. The second system, with a slightly different feature representation, ranked ninth in subtask B. We present a method that entails decomposing tweets into separate parts. Searching for contrast within the constituents of a tweet is an integral part of our system. We embrace an extensive definition of contrast, which leads to broad coverage in detecting ironic content.
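
    The contrast-search idea can be illustrated with a toy snippet: decompose a tweet into parts, score each part's polarity, and flag opposing signs. The word lists here are placeholders; the actual system uses much richer sentiment, distributional-semantic, and surface features.

```python
# Toy sketch of contrast detection between tweet constituents.
import re

POSITIVE = {"love", "great", "awesome", "happy", "sunny"}
NEGATIVE = {"hate", "awful", "terrible", "sad", "boring"}

def part_polarity(part):
    words = re.findall(r"[a-z']+", part.lower())
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def has_contrast(tweet):
    # Decompose at punctuation and a coordinating marker, then compare signs.
    parts = [p for p in re.split(r"[.;!?]| but ", tweet) if p.strip()]
    scores = [part_polarity(p) for p in parts]
    return any(a > 0 > b for a in scores for b in scores)

print(has_contrast("I love sunny days but I hate this weather."))  # True
```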

    Combining Multiple Corpora for Readability Assessment for People with Cognitive Disabilities

    The 12th Workshop on Innovative Use of NLP for Building Educational Applications, 8th September 2017, Copenhagen, Denmark. Given the lack of large user-evaluated corpora in disability-related NLP research (e.g. text simplification or readability assessment for people with cognitive disabilities), the question of choosing suitable training data for NLP models is not straightforward. The use of large generic corpora may be problematic because such data may not reflect the needs of the target population. At the same time, the available user-evaluated corpora are not large enough to be used as training data. In this paper we explore a third approach, in which a large generic corpus is combined with a smaller population-specific corpus to train a classifier, which is then evaluated on two sets of unseen user-evaluated data. One of these sets, the ASD Comprehension corpus, was developed for the purposes of this study and is made freely available. We explore the effects of the size and type of the training data on classifier performance, and the effects of the type of unseen test dataset on classification performance.
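
    A minimal sketch of the combination strategy, with placeholder data and an assumed upweighting scheme: the large generic corpus and the small population-specific corpus are concatenated, and the specific examples receive a higher sample weight so they are not drowned out.

```python
# Sketch: one readability classifier trained on generic + specific corpora.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

generic_texts = ["a long and syntactically complex sentence", "a short sentence"]
generic_labels = [1, 0]                 # 1 = hard to read, 0 = easy (assumed)
specific_texts = ["a short clear sentence rated by the target population"]
specific_labels = [0]

texts = generic_texts + specific_texts
labels = generic_labels + specific_labels
# Upweight the population-specific examples (the factor is illustrative).
weights = [1.0] * len(generic_texts) + [5.0] * len(specific_texts)

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels, logisticregression__sample_weight=weights)
print(model.predict(["an unseen user-evaluated sentence"]))
```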

    Using gaze data to predict multiword expressions

    In recent years, gaze data has been increasingly used to improve and evaluate NLP models, as it carries information about the cognitive processing of linguistic phenomena. In this paper we conduct a preliminary study towards the automatic identification of multiword expressions based on gaze features from native and non-native speakers of English. We compare a part-of-speech (POS) and frequency baseline to: i) a prediction model based solely on gaze data, and ii) a combined model of gaze data, POS, and frequency. In spite of the challenging nature of the task, the best performance was achieved by the latter. Furthermore, we explore how the type of gaze data (from native versus non-native speakers) affects the prediction, showing that data from the two groups is discriminative to an equal degree. Finally, we show that late processing measures are more predictive than early ones, which is in line with previous research on idioms and other formulaic structures.
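
    As a toy illustration of the combined model (feature names and values are assumptions), per-token gaze measures can simply be concatenated with POS and frequency features and fed to a standard classifier:

```python
# Toy sketch: gaze + POS + frequency features for per-token MWE prediction.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Columns: first_fixation_ms (early), total_reading_ms (late), pos_id, log_freq
X = np.array([
    [180.0, 420.0, 1, 2.3],   # token inside an MWE
    [150.0, 160.0, 2, 5.1],   # ordinary token
    [200.0, 510.0, 1, 1.9],
    [140.0, 150.0, 3, 4.8],
])
y = np.array([1, 0, 1, 0])    # 1 = part of a multiword expression

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
# Feature importances give a rough view of which measures carry signal; the
# paper reports late measures (e.g. total reading time) as most predictive.
print(dict(zip(["first_fix", "total_read", "pos", "freq"],
               clf.feature_importances_.round(2))))
```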

    On the Effectiveness of Compact Biomedical Transformers

    Language models pre-trained on biomedical corpora, such as BioBERT, have recently shown promising results on downstream biomedical tasks. Many existing pre-trained models, on the other hand, are resource-intensive and computationally heavy owing to factors such as embedding size, hidden dimension, and number of layers. The natural language processing (NLP) community has developed numerous strategies to compress these models using techniques such as pruning, quantisation, and knowledge distillation, resulting in models that are considerably faster, smaller, and consequently easier to use in practice. By the same token, in this paper we introduce six lightweight models, namely BioDistilBERT, BioTinyBERT, BioMobileBERT, DistilBioBERT, TinyBioBERT, and CompactBioBERT, which are obtained either by knowledge distillation from a biomedical teacher or by continual learning on the PubMed dataset via the Masked Language Modelling (MLM) objective. We evaluate all of our models on three biomedical tasks and compare them with BioBERT-v1.1 to create efficient lightweight models that perform on par with their larger counterparts. All the models will be publicly available on our Huggingface profile at https://huggingface.co/nlpie and the code used to run the experiments will be available at https://github.com/nlpie-research/Compact-Biomedical-Transformers.
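
    The continual-learning recipe can be sketched in a few lines: take a compact general-domain model and keep pre-training it with the MLM objective on biomedical text. The model id and the one-sentence "corpus" below are placeholders, not the released models.

```python
# Sketch: continual MLM pre-training of a compact model on biomedical text.
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling)

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("distilbert-base-uncased")
collator = DataCollatorForLanguageModeling(tokenizer, mlm_probability=0.15)

batch = tokenizer(["Aspirin inhibits platelet aggregation."],
                  return_tensors="pt", padding=True)
# The collator randomly masks tokens and builds the MLM labels.
inputs = collator([{k: v[0] for k, v in batch.items()}])
loss = model(**inputs).loss   # standard masked language modelling loss
loss.backward()               # one continual pre-training step (no optimiser here)
```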

    Cognitive processing of multiword expressions in native and non-native speakers of English: evidence from gaze data

    Gaze data has been used to investigate the cognitive processing of certain types of formulaic language, such as idioms and binominal phrases; however, very little is known about the online cognitive processing of multiword expressions. In this paper we use gaze features to compare the processing of verb-particle and verb-noun multiword expressions to control phrases of the same part-of-speech pattern. We also compare the gaze data for certain components of these expressions and the control phrases in order to find out whether these components are processed differently from the whole units. We provide results for both native and non-native speakers of English and analyse the importance of the various gaze features for the purposes of this study. We discuss our findings in light of the E-Z Reader model of reading.

    Continuous patient state attention models

    Irregular time series (ITS) are prevalent in electronic health records (EHRs), as data is recorded according to clinical guidelines and requirements rather than for research, and its timing also depends on the patient's health status. ITS present challenges for training machine learning algorithms, which are mostly built on the assumption of a coherent, fixed-dimensional feature space. In this paper, we propose a computationally efficient variant of the transformer based on the idea of cross-attention, called Perceiver, for time series in healthcare. We further develop continuous patient state attention models, using the Perceiver and the transformer, to deal with ITS in EHRs. The continuous patient state models use neural ordinary differential equations to learn the patient health dynamics, i.e. the patient health trajectory, from the observed irregular time steps, which enables them to sample any number of time steps at any time. The performance of the proposed models is evaluated on the in-hospital mortality prediction task on the PhysioNet-2012 challenge and MIMIC-III datasets. The Perceiver model significantly outperforms the baselines and reduces computational complexity compared with the transformer model, without significant loss of performance. Carefully designed experiments to study irregularity in healthcare also show that the continuous patient state models outperform the baselines. The code is publicly released and verified at https://codeocean.com/capsule/4587224.
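
    The cross-attention idea behind the Perceiver variant can be sketched as follows (all dimensions and the timestamp encoding are assumptions): a small set of learned latent vectors queries the variable-length, irregularly sampled observations, so cost grows with the number of observations times the number of latents rather than quadratically in the sequence length.

```python
# Sketch: latent cross-attention over an irregular clinical time series.
import torch
import torch.nn as nn

class PerceiverEncoder(nn.Module):
    def __init__(self, obs_dim=16, dim=64, n_latents=8, n_heads=4):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(n_latents, dim))
        self.project = nn.Linear(obs_dim + 1, dim)   # +1 for the timestamp
        self.cross_attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.head = nn.Linear(dim, 1)                # e.g. mortality logit

    def forward(self, values, times):
        # values: (batch, n_obs, obs_dim); times: (batch, n_obs, 1). Feeding
        # raw timestamps lets the model see when each measurement was taken
        # instead of assuming a fixed sampling grid.
        obs = self.project(torch.cat([values, times], dim=-1))
        lat = self.latents.expand(values.size(0), -1, -1)
        lat, _ = self.cross_attn(lat, obs, obs)      # latents query observations
        return self.head(lat.mean(dim=1)).squeeze(-1)

model = PerceiverEncoder()
values = torch.randn(2, 37, 16)      # 37 irregularly spaced observations
times = torch.rand(2, 37, 1)
print(model(values, times).shape)    # torch.Size([2])
```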